Computer Vision for Music Identification: Video Demonstration
Authors
Abstract
This paper describes a demonstration video for our music identification system [5]. The goal of music identification is to reliably recognize a song from a short sample of noisy audio. For instance, a user may wish to identify the music playing on her car radio or in the background at a party. She could send a few seconds of the audio from her mobile phone to a music identification server and receive a text message with the title of the song. The problem is challenging because the recording is often corrupted by noise and because the audio sample matches only a small portion of the target song. Additionally, a practical music identification system should scale, in both accuracy and speed, to databases containing hundreds of thousands of songs. The music identification problem has recently attracted considerable attention [1]–[4], but the task remains unsolved, particularly for noisy real-world queries.

At first glance, problems in the audio domain may appear to have little relevance to computer vision. Audio processing deals with 1-D signals over time, while computer vision tends to focus on the interpretation of one or more 2-D images (typically captured from a 3-D scene). However, we believe that certain problems in the audio domain transform very naturally into a form that can be effectively tackled by computer vision techniques. This belief is motivated by the observation that audio researchers commonly employ 2-D time-frequency representations, such as spectrograms, when analyzing sound or speech. Our approach treats the spectrogram of each music clip as a 2-D image and casts music identification as a corrupted sub-image retrieval problem: identify the portion of a spectrogram image in the database that best matches a given query snippet.
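As a rough illustration of the time-frequency representation mentioned above, the following sketch turns a 1-D signal into a 2-D log-magnitude spectrogram that can be handled like an image. The frame length, hop size, window, and test tone are assumptions chosen for the example; they are not parameters taken from the system described here.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram as a 2-D array.

    Rows are frequency bins and columns are time frames, so the result can be
    treated like a grayscale image. frame_len and hop are illustrative
    assumptions, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the real FFT of each frame, compressed with log(1 + x).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T

# Hypothetical test input: one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
image = spectrogram(tone)
```

A query snippet recorded from a phone would produce a much smaller array of the same kind, and identification then amounts to finding where that small array best fits inside the stored spectrograms.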
By employing pairwise boosting on a large set of Viola-Jones features [6], our system learns compact, discriminative, local descriptors that are amenable to efficient indexing. During the query phase, we retrieve the set of song snippets that locally match the noisy sample and employ geometric verification in conjunction with an EM-based “occlusion” model to identify the song that is most consistent with the observed signal. We have implemented our algorithm in a practical system that can ...
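The idea of a compact local descriptor built from Viola-Jones style box filters can be sketched as follows. This is only a minimal illustration: the filter bank and thresholds below are hand-picked, hypothetical values, whereas the system described above learns them with pairwise boosting, and the geometric verification and occlusion model are not shown.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border, so box sums take four lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] computed from the integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def two_rect_feature(ii, r0, c0, h, w, vertical):
    """Two-rectangle Haar-like feature: first half minus second half."""
    if vertical:
        # Difference between adjacent frequency bands (upper rows minus lower rows).
        half = h // 2
        return (box_sum(ii, r0, c0, r0 + half, c0 + w)
                - box_sum(ii, r0 + half, c0, r0 + 2 * half, c0 + w))
    # Difference between adjacent time spans (left columns minus right columns).
    half = w // 2
    return (box_sum(ii, r0, c0, r0 + h, c0 + half)
            - box_sum(ii, r0, c0 + half, r0 + h, c0 + 2 * half))

# Hypothetical filter bank: (row, col, height, width, vertical). In the system
# described above, filters and thresholds are selected by pairwise boosting.
FILTERS = [(0, 0, 8, 8, True), (0, 0, 8, 8, False),
           (4, 0, 4, 8, True), (0, 4, 8, 4, False)]
THRESHOLDS = [0.0, 0.0, 0.0, 0.0]

def descriptor(patch):
    """Pack one thresholded filter response per bit into an integer."""
    ii = integral_image(patch)
    bits = 0
    for k, ((r, c, h, w, vertical), threshold) in enumerate(zip(FILTERS, THRESHOLDS)):
        if two_rect_feature(ii, r, c, h, w, vertical) > threshold:
            bits |= 1 << k
    return bits

def hamming(a, b):
    """Number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")

# Hypothetical 8x8 spectrogram patch with energy in the upper rows only.
patch = np.zeros((8, 8))
patch[:4, :] = 1.0
flipped = patch[::-1, :].copy()
```

Because each descriptor is a small integer, candidate matches can be looked up in a hash table and compared by Hamming distance, which is one way such descriptors lend themselves to efficient indexing.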
Similar resources
Extracting Coarse Body Movements from Video in Music Performance: A Comparison of Automated Computer Vision Techniques with Motion Capture Data
Citation: Jakubowski K, Eerola T, Alborno P, Volpe G, Camurri A and Clayton M (2017) Extracting Coarse Body Movements from Video in Music Performance: A Comparison of Automated Computer Vision Techniques with Motion Capture Data. Front. Digit. Humanit. 4:9. doi: 10.3389/fdigh.2017.00009 extracting coarse Body Movements from Video in Music Performance: a comparison of automated computer Vision T...
Comparison of Video-Based Instruction and Instructor Demonstration on Learning of Practical Skills in Nursing Students
Introduction: Since technology plays an important role in improving educational quality, finding better methods of teaching and learning and improving equipment and teaching materials is emphasized. Accordingly, two educational methods, presentation by the instructor and video presentation, were offered and their effectiveness on nursing students’ learning skills was compared. Method...
A Novel Approach to Background Subtraction Using Visual Saliency Map
Generally, the human visual system searches for salient regions and movements in video scenes to reduce the search space and effort. Modelling with a visual saliency map provides important information for scene understanding in many applications. In this paper we present a simple method with low computational load that uses a visual saliency map for background subtraction in a video stream. The proposed technique i...
Action Change Detection in Video Based on HOG
Background and Objectives: Action recognition, the process of labeling an unknown action in a query video, is a challenging problem due to event complexity, variations in imaging conditions, and intra- and inter-individual action variability. A number of solutions have been proposed to solve the action recognition problem. Many of these frameworks suppose that each video sequence includes only one ...
Evaluation of a Machine Vision System by Measuring and Estimating a Number of Physical Properties of Pistachio
In order to increase the role of machine vision in agricultural research in Iran, especially for measuring physical attributes of seeds, a machine vision system was developed using a computer, a capture card, a video camera and a light box. All equipment was purchased from domestic markets. Computer programs were developed for hardware setup and for image processing applications. The programs p...
Publication date: 2005